(a) The ROC curve of the random forest model constructed for the breast cancer

data. The AUC was 0.99. (b) The ROC curve using the randomForest package for the

data without any encoding. The AUC was 0.878. (c) The ROC curve using the

randomForest package for the factor Xa protease cleavage data without any encoding. The AUC

50. A tree developed using the party package for the breast cancer data.

dealing with a protease cleavage pattern discovery problem, in

which peptides are non-numerical, there are two ways to construct a

random forest model. Using raw peptides as input is one way, which is

shown above and in Figure 3.49(b) and Figure 3.49(c). In such a random

forest model, the residue variables can be ranked in addition to a

classification model.
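
As a minimal sketch of ranking residue positions with raw (unencoded) peptides, the R code below fits a random forest with the randomForest package on factor-valued residue columns and sorts the positions by importance. The toy peptides, the column names pos1 to pos8, and the labelling rule are all invented for illustration; they are not the factor Xa data or the settings used for the figures.

```r
library(randomForest)

set.seed(42)
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Hypothetical toy data: 200 random peptides, each 8 residues long.
n <- 200
peptide_len <- 8
residues <- replicate(peptide_len, sample(amino_acids, n, replace = TRUE))

# Invented labelling rule (an assumption, not real biology):
# a peptide counts as cleaved when position 4 holds one of five residues.
cleaved <- factor(ifelse(residues[, 4] %in% c("R", "K", "H", "D", "E"),
                         "cleaved", "non-cleaved"))

# Keep the residues as factors, so no numerical encoding is applied.
peptides <- as.data.frame(residues, stringsAsFactors = FALSE)
names(peptides) <- paste0("pos", seq_len(peptide_len))
peptides[] <- lapply(peptides, factor, levels = amino_acids)

# Fit the forest and rank residue positions by mean decrease in Gini impurity.
rf <- randomForest(x = peptides, y = cleaved, ntree = 500, importance = TRUE)
imp <- importance(rf, type = 2)
ranking <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
print(ranking)
```

Because the toy labels depend only on position 4, that column should come out at the top of the ranking; with real cleavage data the ranking would reflect whichever positions actually carry the signal.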

However, if it is required to discover which peptides are the most

important ones for discriminating between cleaved and non-cleaved

peptides, the random forest algorithm can be used to rank cleaved peptides